National Repository of Grey Literature: 24 records found (showing 1 - 10)
Intelligent Data Scraping in a Web Browser
Maštera, František ; Bartík, Vladimír (referee) ; Burget, Radek (advisor)
The goal of this thesis is to extract data from web pages without knowledge of their internal structure. The idea is to recognize the structure using an algorithm and user-supplied input information about the content to be extracted. The structure analysis is then followed by the content extraction itself. An average success rate of over 80% was achieved on selected sets of websites. The resulting algorithm represents a new approach to data extraction and can be deployed in the real world or serve as a basis for further development.
Extension of Apache Tika with Industrial File Formats Text Extraction
Rešetár, René ; Burget, Radek (referee) ; Rychlý, Marek (advisor)
The goal of the bachelor's thesis was to extend the parsers of the Apache Tika project with data and table extraction from industrial document formats produced by laboratory instruments. These data are stored in a structured format according to a defined schema. In the theoretical part, the supplied industrial formats, the Apache Tika project and the possibilities for extending it were examined. In the practical part, a tool was designed and implemented that classifies documents using the Apache Tika project, processes them, creates structured data from them in JSON format and subsequently validates them. Finally, a set of tests was created to verify and demonstrate the properties of the solution.
Image object detection using template
Novák, Pavel ; Mašek, Jan (referee) ; Burget, Radim (advisor)
This thesis focuses on image object detection using a template. Its main contribution is a new method for feature extraction from the Histogram of Oriented Gradients using a set of comparators. The thesis describes the image comparison and feature extraction methods used, with most attention given to the Histogram of Oriented Gradients method, on which the work is based. A small training data set (100 samples) is used, verified by cross-validation and followed by tests on real scenes. The success rate achieved under cross-validation is 98% for the SVM algorithm.
Sentiment Analysis in Automotive Industry
Bezák, Adam ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
The main theme of this thesis is to become familiar with the basic methods of sentiment analysis on social networks. The thesis is aimed at the automotive industry, although the same principle can be applied to any other domain. The practical part consists of obtaining data from social networks, analyzing them and then indexing them into an ElasticSearch database. Another goal of the thesis is to visualize these data by means of a web portal. The created web portal provides various statistics on the leading automobile brands, an overview of new trends, and aspect visualization for individual cars.
Relationship between Changes in Betting Odds and Results of Football Matches
Jurkovič, Juraj ; Bartík, Vladimír (referee) ; Zendulka, Jaroslav (advisor)
The goal of this thesis is to demonstrate techniques for solving web scraping and knowledge discovery tasks. The case study focuses on the extraction of data from bookmaker websites and the subsequent analysis of the collected data. The thesis demonstrates the implementation of a web scraping task in Python. It describes selected implementation details for developing such a system and proposes a database schema that can be used for this purpose. The collected data is analyzed using statistical methods, and frequent patterns in odds movements are discovered using the Apriori algorithm. The discovered relationships and frequent patterns are presented to the end user.
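The frequent-pattern step mentioned in this abstract can be illustrated with a minimal Apriori sketch in plain Python. This is not the thesis's implementation: the function name `apriori`, the event labels such as `home_odds_down`, and the sample matches are all hypothetical and serve only to show how itemsets that co-occur in at least `min_support` matches are found level by level.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset that occurs
    in at least `min_support` transactions (illustrative sketch)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count the individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count the support of each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical per-match events: an odds movement plus the match result.
matches = [
    {"home_odds_down", "home_win"},
    {"home_odds_down", "home_win"},
    {"home_odds_down", "draw"},
    {"home_odds_up", "away_win"},
]
patterns = apriori(matches, min_support=2)
```

With these made-up matches, the pair `{"home_odds_down", "home_win"}` is the only frequent itemset of size two; real odds data would need a discretization step before it could be treated as such events.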
Methods of Data Extraction from the Web
Perina, Lukáš ; Křivka, Zbyněk (referee) ; Burget, Radek (advisor)
The purpose of this bachelor thesis is to design the architecture of, and subsequently implement, an application for data extraction (web scraping) from web documents. Unlike conventional methods, the extraction is based on defining the data types and regular expressions of the requested elements. The extraction does not require knowledge of the detailed structure of a given web document, and a single definition can be used to detect the requested elements on different web pages. The algorithm achieves an overall accuracy of 85.51% and a recall of 80.28%. This approach can significantly reduce the time required for the analysis of web pages, and the structure of the code is no longer a determining factor when creating web scraping requests.
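The idea of one structure-independent definition per data type can be sketched as follows. This is an assumption-laden illustration rather than the application described in the abstract: the `DATA_TYPES` table, its regular expressions, the `TextCollector` class and the `extract` function are all hypothetical. Only the Python standard library is used.

```python
import re
from html.parser import HTMLParser

# Hypothetical data-type definitions: one regular expression per requested type.
DATA_TYPES = {
    "price": re.compile(r"\d+(?:[.,]\d{2})?\s?(?:EUR|USD|CZK)"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


class TextCollector(HTMLParser):
    """Collects non-empty text nodes and ignores the tag structure."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def extract(html, data_types=DATA_TYPES):
    """Return {type name: [matched strings]} for the given HTML source."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    found = {name: [] for name in data_types}
    for chunk in collector.chunks:
        for name, pattern in data_types.items():
            found[name].extend(m.group(0) for m in pattern.finditer(chunk))
    return found


# Two pages with different markup, processed with the same definition.
page_a = "<div><span>Laptop</span><b>499.00 EUR</b><i>2020-05-01</i></div>"
page_b = "<table><tr><td>Phone</td><td>1299,50 CZK on 2021-11-30</td></tr></table>"
result_a = extract(page_a)
result_b = extract(page_b)
```

Because matching runs over text nodes only, the same definition applies to both pages even though one uses `div`/`span` markup and the other a table.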
Environment for analyzing suspicious device
Procházka, Jan ; Martinásek, Zdeněk (referee) ; Malina, Lukáš (advisor)
This bachelor thesis focuses on the design of an environment for the analysis of a suspicious device. Such a device may be, for example, a disk contaminated by malicious code or a mobile device. The aim of this work is to design an efficient and simple solution using open source products. The final designed environment should be capable of performing both surface and in-depth data analysis. The theoretical part provides information related to the scope of the addressed problem and covers terms such as sandbox, malware and Android. These are described with a view to understanding the analysis of malware occurring predominantly on mobile devices. The practical part describes the hardware and software used in the design of the environment and contains examples of analyses of external devices contaminated by malicious code. These examples mainly concern Android mobile devices.
Portal for Aggregation of Data from Web Sources
Mikita, Tibor ; Křivka, Zbyněk (referee) ; Burget, Radek (advisor)
This thesis deals with data extraction and data aggregation from heterogeneous web sources. The goal is to create a platform and a functional web application using appropriate technologies. The main focus of the thesis is on the application design and implementation. The application domain is accommodation, specifically apartment rental. For the data extraction, we use either the portal API or a wrapper. The obtained data is stored in a document database. In this thesis, we designed and implemented a system that obtains rental ads from multiple web sources at the same time and presents them in a uniform way.
Information Extraction from Forms
Pálinkás, Adam
This thesis discusses the design and implementation of an application that uses advanced text recognition and image processing techniques to process scanned forms filled in by hand. Existing methods and techniques for text recognition are analyzed. The chosen methods and techniques are implemented to create the final solution, which streamlines form processing at CYRRUS, a. s.
Layout-based Data Extraction from Documents
Sedláček, Martin ; Bartík, Vladimír (referee) ; Burget, Radek (advisor)
This thesis deals with automated data extraction from medical reports in PDF format based on document layout analysis. The main content of the thesis is an introduction to data extraction, a comparison of existing tools and a presentation of the design and requirements of the developed tool, which will be based on the FitLayout application framework. The thesis then describes the actual implementation of the tool in Java and comments on the results achieved by the tool on real data.
